Background

This module builds on code contained in Coronavirus_Statistics_USAF_v007.Rmd. This file includes the latest code for analyzing data from USA Facts. USA Facts maintains county-level data on coronavirus cases and deaths in the US. The downloaded data have one row per county and one column per date, with a separate file for each of cases, deaths, and population.
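
Because the downloaded files hold one row per county and one column per date, most analysis needs them reshaped to one row per county and date. The sketch below shows that reshape in base R on a made-up two-county table. The `countyFIPS` key and the date-named columns follow the file layout; the values, the `cases`/`new_cases` naming, and the assumption that the counts are cumulative are illustrative and not taken from the sourced files.

```r
# Toy wide table: one row per county, one column per date (values are invented)
wide <- data.frame(countyFIPS = c(1001, 1003),
                   `2020-01-22` = c(0, 1),
                   `2020-01-23` = c(2, 4),
                   `2020-01-24` = c(5, 4),
                   check.names = FALSE
                   )

# Every column other than the key is treated as a date column
dateCols <- setdiff(names(wide), "countyFIPS")

# Stack the date columns: one row per county per date
long <- data.frame(countyFIPS = rep(wide$countyFIPS, times = length(dateCols)),
                   date = as.Date(rep(dateCols, each = nrow(wide))),
                   cases = unlist(wide[dateCols], use.names = FALSE)
                   )
long <- long[order(long$countyFIPS, long$date), ]
rownames(long) <- NULL

# Assumption: counts are cumulative, so daily values are first differences by county
long$new_cases <- ave(long$cases, long$countyFIPS, FUN = function(x) c(x[1], diff(x)))

# The long table should be unique by county and date
stopifnot(!anyDuplicated(long[c("countyFIPS", "date")]))
print(long)
```

The final `stopifnot()` mirrors the kind of uniqueness check that the processing log reports for `countyFIPS` and `date`.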

The intent of this code is to move updated functions to sourcing files and to better manage memory.

Sourcing Functions

The tidyverse library is loaded, and the functions used for CDC daily processing are sourced. Functions specific to USA Facts are also sourced:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Functions are available in the source files
source("./Generic_Added_Utility_Functions_202105_v001.R")
source("./Coronavirus_CDC_Daily_Functions_v002.R")
source("./Coronavirus_USAF_Functions_v002.R")

Further, the mapping file specific to USA Facts is sourced:

source("./Coronavirus_USAF_Default_Mappings_v002.R")
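
As a minimal sketch of what sourcing does, the example below writes a throwaway file to a temporary location, sources it into its own environment, and then drops that environment once it is no longer needed. The file contents, the `helperEnv` name, and the `addOne()` function are hypothetical. The files sourced above are loaded straight into the global environment instead.

```r
# Write a throwaway source file (stand-in for a file of utility functions)
tmpFile <- tempfile(fileext = ".R")
writeLines("addOne <- function(x) x + 1", tmpFile)

# Fail early if the file to be sourced is missing
stopifnot(file.exists(tmpFile))

# Source into a dedicated environment rather than the global environment
helperEnv <- new.env()
source(tmpFile, local = helperEnv)
res <- helperEnv$addOne(41)
print(res)

# Objects that are no longer needed can be removed and the memory reclaimed
rm(helperEnv)
invisible(gc())
unlink(tmpFile)
```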

Data Updates

The latest county-level burden data are read, with each file downloaded only if it is not already available locally:

# Local paths for the case and death files (downloaded 2023-02-08)
readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230208.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20230208.csv"
                 )
# Raw data from the earlier cty_newdata_20230108 run serve as the reference for comparison
compareList <- list("usafCase"=readFromRDS("cty_newdata_20230108")$dfRaw$usafCase, 
                    "usafDeath"=readFromRDS("cty_newdata_20230108")$dfRaw$usafDeath
                    )

# Use existing clusters from the cty_newdata_20210813 run
cty_newdata_20230208 <- readRunUSAFacts(maxDate="2023-02-06", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
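
The `downloadTo` argument in the call above is built so that a file already on disk maps to `NA` and a missing file maps to its own path. Judging by the log message about using the existing file, `NA` appears to signal that no download is needed, though that is an inference about `readRunUSAFacts()` and is not shown here. A small sketch of the idiom on its own, using temporary files with made-up contents:

```r
# Two hypothetical targets: one that already exists on disk, one that does not
existingFile <- tempfile(fileext = ".csv")
writeLines("countyFIPS,2020-01-22", existingFile)
missingFile <- tempfile(fileext = ".csv")

readList <- list("usafCase" = existingFile, "usafDeath" = missingFile)

# NA when the file is already present, otherwise the path itself
downloadTo <- lapply(readList, FUN = function(x) if (file.exists(x)) NA else x)

str(downloadTo)
unlink(existingFile)
```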
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230208.csv
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation ideoms with `aes()`

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 34
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 34
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType    cases     new_cases            n
##   <chr>     <dbl>         <dbl>        <dbl>
## 1 before 4.90e+10 97284771      3547423     
## 2 after  4.84e+10 95083869      3490762     
## 3 pctchg 1.20e- 2        0.0226       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType  deaths   new_deaths            n
##   <chr>    <dbl>        <dbl>        <dbl>
## 1 before 6.74e+8 1082388      3547423     
## 2 after  6.46e+8 1002861      3490762     
## 3 pctchg 4.16e-2       0.0735       0.0160
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.

## NULL
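
The `pctchg` rows in the two summary tables above are consistent with the fraction removed by the filtering rules, (before - after) / before. This is inferred from the printed numbers, not from the function source. The check below recomputes it from the logged `new_cases`, `new_deaths`, and `n` values:

```r
# Before/after column sums copied from the log output above
before <- c(new_cases = 97284771, new_deaths = 1082388, n = 3547423)
after  <- c(new_cases = 95083869, new_deaths = 1002861, n = 3490762)

# Fraction of each total removed by the filtering rules
pctchg <- (before - after) / before
print(signif(pctchg, 3))
```

To three significant figures these come to 0.0226, 0.0735, and 0.0160, matching the log: the rules drop about 1.6% of records, which account for about 2.3% of new cases and 7.3% of new deaths.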

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20230208$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the refreshed file
saveToRDS(cty_newdata_20230208, ovrWriteError=FALSE)

Vaccine data are also updated:

cty_vaxdata_20230209 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20230209.csv", 
                                              ctyList=readFromRDS("cty_newdata_20230208"), 
                                              minDateCD=c("2022-06-09", "2022-06-09"),
                                              maxDateCD="2023-01-26"
                                              )
## Rows: 414347 Columns: 80
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Date, FIPS, Recip_County, Recip_State, SVI_CTGY, Metro_status
## dbl (74): MMWR_week, Completeness_pct, Administered_Dose1_Recip, Administere...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 
## Records from other than 50 states and DC:
## # A tibble: 9 × 2
##   state     n
##   <chr> <int>
## 1 AS      126
## 2 FM      127
## 3 GU      252
## 4 MH      126
## 5 MP      126
## 6 PR     9969
## 7 PW      126
## 8 VI      506
## 9 <NA>     81

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).
## 
## Count of NA records by column
##           state            FIPS popgte65_minpop popgte65_maxpop    popgte65_nnA 
##               0               0               0               0               0 
##               n 
##               0 
## 
## Records where minimum and maximum population differ
## # A tibble: 0 × 5
## # … with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## #   maxpop <dbl>
## 
## 
## 
## Will run with parameters:
## burdenVar: cpm dpm 
## vaxVar: vxcpoppct vxcpoppct 
## minDateCD: 2022-06-09 2022-06-09 
## maxDateCD: 2023-01-26 2023-01-26
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##        Min         1Q     Median         3Q        Max 
## -313131506   -1913551     266283    2800223  168848464 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28983.59    3272.65   8.856  < 2e-16 ***
## vaxMetric     156.60      50.59   3.096  0.00198 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11020000 on 3124 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.003058,   Adjusted R-squared:  0.002739 
## F-statistic: 9.584 on 1 and 3124 DF,  p-value: 0.001981
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##        Min         1Q     Median         3Q        Max 
## -313079816   -2168552     -27943    2554658  168198528 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                34129.38   11883.21   2.872 0.004105 ** 
## type>500k               18078.44    7005.12   2.581 0.009904 ** 
## type100k-500k           26691.52    6982.55   3.823 0.000135 ***
## type25k-100k            30691.16    7874.97   3.897 9.93e-05 ***
## vaxMetric:type<25k        139.46     239.27   0.583 0.560040    
## vaxMetric:type>500k       305.47      99.28   3.077 0.002111 ** 
## vaxMetric:type100k-500k   188.19     112.16   1.678 0.093467 .  
## vaxMetric:type25k-100k    138.08     148.62   0.929 0.352936    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11020000 on 3118 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.5674, Adjusted R-squared:  0.5662 
## F-statistic: 511.1 on 8 and 3118 DF,  p-value: < 2.2e-16
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -3701058   -22105     2245    34856   777649 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 486.4319    35.9334  13.537   <2e-16 ***
## vaxMetric    -5.1424     0.5554  -9.258   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 121000 on 3124 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.0267, Adjusted R-squared:  0.02639 
## F-statistic: 85.71 on 1 and 3124 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -3646748   -29301    -6242    25901   766646 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                 387.020    129.649   2.985 0.002857 ** 
## type>500k                301.573     76.428   3.946 8.13e-05 ***
## type100k-500k            259.678     76.181   3.409 0.000661 ***
## type25k-100k             420.757     85.918   4.897 1.02e-06 ***
## vaxMetric:type<25k        -1.708      2.610  -0.654 0.513066    
## vaxMetric:type>500k       -2.941      1.083  -2.715 0.006664 ** 
## vaxMetric:type100k-500k   -1.289      1.224  -1.053 0.292391    
## vaxMetric:type25k-100k    -2.779      1.622  -1.714 0.086621 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 120300 on 3118 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.1849, Adjusted R-squared:  0.1828 
## F-statistic: 88.41 on 8 and 3118 DF,  p-value: < 2.2e-16
# Save the refreshed file
saveToRDS(cty_vaxdata_20230209, ovrWriteError=FALSE)
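
The second model in each pair of regression summaries above uses the formula `vaxMetric * type + 0 - vaxMetric`. This drops the global intercept and the shared slope, so each population band gets its own intercept (`type...`) and its own slope (`vaxMetric:type...`). A minimal sketch on simulated data (all values and the two band labels are made up) shows that this matches a separate weighted regression fitted within a band:

```r
set.seed(42)

# Simulated stand-in for dfReg: two population bands with different lines
n <- 200
dfReg <- data.frame(type = rep(c("large", "small"), each = n / 2),
                    vaxMetric = runif(n, min = 30, max = 90),
                    pop = round(runif(n, min = 1000, max = 100000))
                    )
dfReg$burden <- ifelse(dfReg$type == "large", 300, 400) -
    ifelse(dfReg$type == "large", 3, 1) * dfReg$vaxMetric +
    rnorm(n, sd = 20)

# One intercept and one slope per band, no global intercept or shared slope
fitAll <- lm(burden ~ vaxMetric * type + 0 - vaxMetric, data = dfReg, weights = pop)
print(coef(fitAll))

# The same two numbers come from a weighted fit restricted to a single band
fitLarge <- lm(burden ~ vaxMetric, data = dfReg[dfReg$type == "large", ], weights = pop)
print(coef(fitLarge))
```

One consequence of dropping the intercept is that `summary.lm()` computes R-squared relative to zero instead of the mean of the response. The much higher R-squared values of the by-type models (for example roughly 0.57 against 0.003 in the first pair) therefore largely reflect the changed baseline, and the nearly unchanged residual standard errors suggest little real gain in fit.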

County-level data are post-processed:

cty_postdata_20230208 <- postProcessCountyData(lstCtyBurden=cty_newdata_20230208$dfPerCapita, 
                                               lstCtyVax=cty_vaxdata_20230209$vaxFix, 
                                               lstState=readFromRDS("cdc_daily_230202")$dfPerCapita, 
                                               excludeStates="AK"
                                               )
## 
## Parameter maxDate is: 2023-02-01
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>

Additional post-processing steps are run:

# Step 1a: Burden comparisons for aggregated states
additionalCountyPostProcess(cty_postdata_20230208, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
## Warning: Removed 6 rows containing missing values (`geom_line()`).

# Step 1: Burden aggregation for key states
# Step 2: vaccine comparisons
# Step 3: Scoring updates (and errors)
# Step 4: New rolling data (28-day default with ceilings 50000 CPM, 500 DPM)
additionalCountyPostProcess(cty_postdata_20230208, 
                            p1CompareStates=c("GA", "FL", "NE", "IL", "OR"), 
                            p2VaxStates=c("MA", "HI", "VA", "VT", "RI", "NE"), 
                            p3VaxTimes=sort(c("2022-01-01", "2023-01-25")),
                            p4DF=cty_newdata_20230208$dfPerCapita, 
                            excludeStates=c("AK")
                            )
## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 379 rows containing missing values (`geom_line()`).
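
Step 4 above refers to rolling data with a 28-day default window and ceilings of 50000 CPM and 500 DPM. How `additionalCountyPostProcess()` computes this is not shown. The sketch below is one plausible reading, a trailing 28-day sum capped at a ceiling, written in base R. The function name `rollCapped()`, its arguments, and the sample series are hypothetical. `stats::filter()` is called with its namespace because `dplyr::filter()` masks it once the tidyverse is attached.

```r
# Trailing rolling sum over `window` days, capped at `maxValue`
rollCapped <- function(x, window = 28, maxValue = 50000) {
    rolled <- as.numeric(stats::filter(x, rep(1, window), sides = 1))
    pmin(rolled, maxValue)  # the first window - 1 values stay NA
}

# Hypothetical daily cases per million for one county over 40 days
cpmDaily <- c(rep(100, 30), rep(5000, 10))
cpm28 <- rollCapped(cpmDaily)

print(cpm28[27:40])
```

The rolling total is undefined until a full window is available, so the first 27 values are `NA`; the last value would be 51800 uncapped and is held at the 50000 ceiling.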